A model-free method to learn multiple skills in parallel on modular robots

Legged robots are well-suited for deployment in unstructured environments but require a control scheme specific to their design. As controllers optimised in simulation do not transfer well to the real world (the infamous sim-to-real gap), methods that enable quick learning in the real world, without any assumptions about the specific robot model and its dynamics, are necessary. In this paper, we present a generic method based on Central Pattern Generators that enables the acquisition of basic locomotion skills in parallel, through very few trials. The novelty of our approach, underpinned by a mathematical analysis of the controller model, is to search for good initial states instead of optimising connection weights. Empirical validation on six different robot morphologies demonstrates that our method enables robots to learn primary locomotion skills in less than 15 minutes in the real world. Finally, we showcase the learned skills in a targeted locomotion experiment.


Appendix overview
Mathematical Analysis Appendix A
Our mathematical analysis shows that we can always solve the ODE of our SO2-oscillator in its canonical form. This results in a description of the state value of neuron i as a (set of) cosine(s) (s_i(t) = cos_i), Equation A13. When connecting multiple SO2-oscillators, the canonical form changes such that each neuron state function contains a linear combination of modulated cosines (s_i(t) = ∑_n^{|N_i|} a_n cos_n), where the total number of cosines is the same as the total number of oscillators in the CPG-network (|N_i|), Equation A24. Adding inter-connections affects the cosines as follows: frequencies are pushed toward zero and toward higher values (as a function of their (inter-)connection strength and number of inter-connections, Figure A3), amplitudes become a ratio over frequencies, and different phase offsets are introduced. Overall, network structure (i.e., inter-connections) changes the distribution of cosine sums in each of the oscillator states; connection strength defines the frequency, amplitude, and phase offset; initial states define the amplitude and phase offset.

CPG-network optimisation Appendix B
This analysis shows that weight optimisation for inter-connected CPGs is difficult, insofar as weight adaptation changes the network behaviour in three distinct ways, which, in turn, produces many local optima. Preliminary optimisation experiments (where we try to reproduce the network state time-series of a randomly generated CPG-network) support this analysis. Here, we found that weight optimisation can reproduce the same signal with different sets of weights and frequencies (Figure B4). This shows that the frequency content is not that important when we attempt to optimise CPG-networks. Therefore, rather than optimising the weights in our CPG-network (conventional practice in the literature), we can optimise the initial state with random weights (see Figure B5). Here, we find that, in terms of quality, ISO and WO perform similarly, while in terms of efficiency, ISO finds a solution much faster. All in all, these results generate an important insight: even without the correct frequency content in the CPG-network, ISO can approximate a time-series faster than WO with the same efficacy.

ISO analysis Appendix C
We provide additional analysis to understand the performance of the ISO algorithm. First, a hyper-parameter sweep provides the optimal balance between trial time, window length, and test time (section C.A). Secondly, we test the capability of ISO to learn many skills in parallel (section C.B). Here, we show the ability to learn up to 100 different skills in parallel without loss of performance. Lastly, we showcase parallel skill learning on 20 robots for 6 different skills (section C.C).

Extended result Appendix D
A collection of additional analyses of the performance of our ISO algorithm with respect to WO. We present the real-world behaviour of our algorithms with corresponding video material. Here, ISO outperformed WO* in 11 out of 18 cases while using less than 5% of the training time. Additionally, we zoom in on the overall robustness of our algorithms with respect to the effect of the sim-to-real transfer on the overall performance. ISO proves more robust than WO* in terms of the number of times a performance drop was measured (10/18 vs. 17/18). Additionally, the effect size of the sim-to-real gap is smaller for ISO than for WO* (-2.32% vs. -57.33%). Finally, we propose a feedback structure that continuously selects skills in a targeted locomotion task.

Appendix A Mathematical analysis of CPG-network models
To get a good understanding of the influence of structure, weights, and state-initialisation, we will analyse our CPG-network in two exemplary ways: 1) a single CPG oscillator, 2) two coupled harmonic oscillators. In Figure A1 we show how we can rewrite our CPG model into a Linear Time Invariant (LTI) system of ordinary differential equations, where we disregard the readout neuron. Please note that we follow the coupling of weights expressed in the methodology and simplify the oscillator weights w_{x_i y_i} and w_{y_i x_i} to ±w.
We can add and connect a second oscillator to our network by extending our weight matrix as shown in Figure A2. Please note that we follow the coupling of weights expressed in the methodology by simplifying the inter-connection weights c_{x_i x_j} and c_{x_j x_i} to ±c. The LTI-matrix representations correspond fully with the CPG ODEs. In section A.A we derive an analytical solution for a single CPG, and in section A.B we present the analytical solution for two inter-connected oscillators.
Fig. A1: Different representations of the same CPG.

A.A Analytical solution of a single CPG
Proposition: The state of a single CPG oscillator in time can be described as a set of cosines.
Solving the EOM shows that, no matter what set of weights is chosen, an SO2-oscillator will always contain purely imaginary eigenvalues in its canonical form, which proves the existence of elliptic fixed point(s). The network weights define the frequency, and the initial states define the amplitude and phase-offset.
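From now on we will use Newton's dot notation to denote derivatives with respect to time. A single oscillator with state $s = [x, y]^T$ can be described by the following LTI-matrix:

$$A = \begin{bmatrix} 0 & w \\ -w & 0 \end{bmatrix}$$

Rewriting gives us the following differential equation to solve:

$$\dot{s} - As = 0 \tag{A6}$$

This is a first-order differential equation whose solution can be written as follows using the Taylor series expansion:

$$s(t) = e^{At}s_0 = \sum_{k=0}^{\infty} \frac{(At)^k}{k!}\, s_0 \tag{A7}$$

Here, $s_0$ indicates the initial state of the CPG (i.e. of the x- and y-neuron). Properties of this power series are derived more in depth here [? ]. We show that this expansion is true by taking the derivative of Equation A7:

$$\dot{s}(t) = \sum_{k=1}^{\infty} \frac{A^k t^{k-1}}{(k-1)!}\, s_0 = A \sum_{k=0}^{\infty} \frac{(At)^k}{k!}\, s_0 = As(t)$$

Equation A7 is still hard to interpret, as it is not clear what this infinite series entails for our CPG state over time. Fortunately, we can diagonalise A through its eigendecomposition $A = T \Lambda T^{-1}$, with $\Lambda$ being a diagonal matrix with the eigenvalues on its diagonal and $T$ the matrix of corresponding eigenvectors. Rewriting Equation A7 leads to [? ]:

$$s(t) = T e^{\Lambda t} T^{-1} s_0 \tag{A9}$$

If we solve the factorisation, we find the eigenvalues for our single oscillator:

$$\det(A - I\lambda) = \lambda^2 + w^2 = 0 \quad\Rightarrow\quad \lambda = \pm iw$$

We obtain two purely imaginary numbers as our eigenvalues. To obtain the eigenvectors we need to solve $(A - I\lambda)V = 0$ for both eigenvalues:

$$V_{+} = \begin{bmatrix} 1 \\ i \end{bmatrix}, \quad V_{-} = \begin{bmatrix} 1 \\ -i \end{bmatrix}, \quad T = \begin{bmatrix} 1 & 1 \\ i & -i \end{bmatrix}, \quad T^{-1} = \frac{1}{2}\begin{bmatrix} 1 & -i \\ 1 & i \end{bmatrix}$$

Filling in $T \Lambda T^{-1}$ in Equation A9 to obtain s(t) for a given initial state $s_0 = [x_0, y_0]^T$:

$$s(t) = \frac{1}{2}\begin{bmatrix} 1 & 1 \\ i & -i \end{bmatrix} \begin{bmatrix} e^{iwt} & 0 \\ 0 & e^{-iwt} \end{bmatrix} \begin{bmatrix} 1 & -i \\ 1 & i \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix}$$

Euler's formula shows that $e^{iwt} + e^{-iwt} = 2\cos wt$ and $e^{iwt} - e^{-iwt} = 2i \sin wt$. Thus, for a single CPG the analytical solution of both neurons' states is:

$$s(t) = \begin{bmatrix} x_0 \cos wt + y_0 \sin wt \\ -x_0 \sin wt + y_0 \cos wt \end{bmatrix}$$

The analytical solution shows that we always obtain two pure cosines because, no matter what matrix weights are chosen, the eigenvalues are always purely imaginary, which proves the existence of elliptic fixed point(s). Elliptic fixed points produce infinitely many quasi-periodic orbits, as there is no 'damping' present in the system (meaning $|\dot{s}(t)| = |\dot{s}(t + n)|\ \forall n \in \mathbb{R}$) that would make it converge asymptotically towards an attractor. Furthermore, we can clearly see how the design variables, weights and initial state, affect the network behaviour. The weight in the network defines the frequency ($w$), and the initial state of the network defines the amplitude ($\sqrt{x_0^2 + y_0^2}$) and phase-offset ($\pm \arctan \frac{x_0}{y_0}$).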
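The single-oscillator result can be checked numerically. The following is a minimal sketch, assuming the sign convention A = [[0, w], [-w, 0]] (chosen so that the x-neuron follows x_0 cos wt + y_0 sin wt); the function names and the numeric values are illustrative and not taken from the paper. It evaluates s(t) through the eigendecomposition T e^{Λt} T^{-1} s_0 and compares it with the closed-form cosines.

```python
import numpy as np

def so2_state_eig(w, s0, t):
    """s(t) = T exp(Lambda t) T^-1 s0 for the assumed matrix A = [[0, w], [-w, 0]]."""
    A = np.array([[0.0, w], [-w, 0.0]])
    eigvals, T = np.linalg.eig(A)                 # eigenvalues are +/- i*w
    exp_lambda_t = np.diag(np.exp(eigvals * t))
    s = T @ exp_lambda_t @ np.linalg.inv(T) @ np.asarray(s0, dtype=complex)
    return s.real, eigvals                        # imaginary parts cancel up to round-off

def so2_state_closed_form(w, s0, t):
    """Closed-form solution of the same oscillator."""
    x0, y0 = s0
    return np.array([x0 * np.cos(w * t) + y0 * np.sin(w * t),
                     -x0 * np.sin(w * t) + y0 * np.cos(w * t)])

# Illustrative values (hypothetical, not from the paper).
w, s0, t = 0.7, (0.3, -0.5), 2.4
s_eig, eigvals = so2_state_eig(w, s0, t)
s_closed = so2_state_closed_form(w, s0, t)
amplitude = np.hypot(*s0)                         # sqrt(x0^2 + y0^2)

print("eigenvalues:", eigvals)
print("max difference between both solutions:", np.abs(s_eig - s_closed).max())
print("state norm vs. amplitude:", np.linalg.norm(s_closed), amplitude)
```

The state norm stays equal to the amplitude set by the initial state, which is the undamped behaviour described above.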

A.B Analytical solution of two inter-connected CPGs
Proposition: The states of two coupled CPG oscillators can be described as a sum of two cosines with distinct frequencies.
Solving the EOM shows that, no matter what combination of weights is used in two coupled oscillators, the canonical form will always contain two purely imaginary eigenvalues (or fewer). Connecting the two oscillators induces abrupt changes in their dynamics, where both cosines appear in both oscillators' states, with weights defining the cosines' frequency, amplitude, and offset, and the initial state defining the amplitudes and phase-offsets.
To analytically solve the EOM for two inter-connected neuron pairs we extend our LTI-matrix description as shown in Figure A2. We proceed with the same steps as in the single oscillator case.
We first determine the eigenvalues of our coupled oscillators.
Using the quadratic formula to solve for p, we split the result into two distinct parts: C (which depends on the inter-connection c) and W (which depends solely on the oscillator weights). Equation A17 expresses this solution in terms of p = λ². From this, we can already get some intuition on the eigenvalues.
W± provides two instances of W for which W− ≤ W+ holds for all combinations of w_1, w_2. In the most general case, where both oscillator weights are non-zero (w_1 ≠ 0 ∧ w_2 ≠ 0), we see that W− is strictly smaller than W+ (W− < W+), which means that p will always be strictly negative, because the discriminant is smaller than the non-discriminant. This result shows that whatever values for C and W− < W+ we use, our SO2-oscillator pair in its canonical form contains a set of purely imaginary eigenvalue pairs λ. Thus, our controller behaviour can always be described as a set of pure cosines.
Additionally, consider the special case where either one, but not both, of the oscillator weights is equal to zero. Here, the resulting dynamics reduce to a single SO2-oscillator that depends on the amplitude of the inter-connection and the remaining weight. In fact, the LTI-description of our inter-connected oscillators can be rewritten (by row-reduction) to a single oscillator with zeros at the remaining LTI-matrix indices, independent of which weight is set to zero. It is important to note that this reaffirms our previous finding that any type of SO2-oscillator can be defined as a set of cosines in its canonical form, where this special case reduces the number of cosines to that of a single oscillator.
For C we can look at two other special cases: where c approaches zero (c → 0) and where c approaches infinity (c → ∞). Just like Equation A18, we again obtain two purely imaginary numbers as our eigenvalues when c approaches zero. But now our oscillator weights dominate the behaviour of the cosines as if they were two separate oscillators. This is not surprising, as c → 0 implies that we almost remove the inter-connection between the two base oscillators from Figure A2 and end up with two independent instances of the single CPG defined before (Equation A3), for which we already analytically solved the equations of motion (in section A.A). For c → ∞, the result shows a dependency of the two coupled oscillator frequencies on the relative size of c with respect to w_1, w_2. If the inter-connection weight dominates the oscillator weights, then the inter-connected x-neurons start to behave together as a single CPG. This result resembles a single CPG with frequency c between the two x-neurons, because of the negligible effects of the two surrounding y-neuron weights. The results also show what happens when both oscillator weights are zero (w_1 = 0 ∧ w_2 = 0) and only connection c is non-zero.
The eigenvalue analysis reveals that for any type of inter-connected SO2-oscillator the resulting behaviour can be described by its canonical form as a set of pure cosines. The weights influence the frequency content non-linearly, where special cases with weights equal to zero abruptly change the dynamical system as a whole (i.e. reducing the number of cosines, or creating single oscillators). Now consider the more general case, where the influence of both c and w is not negligible. Here we know that the eigenvalues are purely imaginary pairs (from Equation A18). For the sake of clarity, we refer to these eigenvalues as ±λ_1 i and ±λ_2 i while deriving our eigenvectors. The i notation is there to remind us of the imaginary component. To obtain the eigenvectors we need to solve (A − Iλ)V = 0 for all eigenvalues. Filling in Equation A9, we obtain s(t) for a given initial state. Solving the EOM for two inter-connected CPGs shows that the resulting behaviour constitutes two independent cosines. This result is similar to the single CPG case, with some distinct features. In the inter-connected case, the cosines from the canonical form are now present in all neuron states, resulting in a linear combination of two cosine signals inside the analytical solution for each neuron state. For the initial state we can again see an effect on the amplitude and phase-offset of the cosines present in the neuron states. Equation A24 shows that (inter-connected) weights also influence amplitude and phase-offset in addition to their frequency content.
The eigenvalue analysis shows that weights affect frequencies non-linearly, with special cases where weights equal to zero induce abrupt changes in the dynamical system as a whole. The latter gives us insight into what changes in network structure (meaning allowing (inter-)connections to exist) do to the behaviour of the CPGs. An important finding to emphasise is that adding inter-connections does not alter the number of cosines but does influence frequency content non-linearly. Network structure thus defines which oscillator pairs exhibit the same frequency content (i.e., frequency distribution); more on this in the following section.
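The special cases discussed in this section can be reproduced with a few lines of NumPy. This is a sketch under stated assumptions: the state order [x_1, y_1, x_2, y_2] and the matrix layout in `coupled_matrix` are inferred from the ±w and ±c coupling described above rather than copied from Figure A2, and the function names are illustrative. For this assumed matrix the characteristic polynomial in p = λ² is p² + (w_1² + w_2² + c²)p + w_1² w_2² = 0, which the code uses as a cross-check.

```python
import numpy as np

def coupled_matrix(w1, w2, c):
    """Assumed LTI matrix of two SO2-oscillators coupled through their x-neurons."""
    return np.array([[0.0,  w1,   c, 0.0],
                     [-w1, 0.0, 0.0, 0.0],
                     [-c,  0.0, 0.0,  w2],
                     [0.0, 0.0, -w2, 0.0]])

def frequencies(w1, w2, c):
    """Sorted frequencies |Im(lambda)|, one per conjugate eigenvalue pair."""
    eigvals = np.linalg.eigvals(coupled_matrix(w1, w2, c))
    assert np.allclose(eigvals.real, 0.0, atol=1e-9)   # purely imaginary spectrum
    return np.sort(np.abs(eigvals.imag))[::2]

def frequencies_from_polynomial(w1, w2, c):
    """Same frequencies from the quadratic in p = lambda^2 (valid for the assumed matrix)."""
    p = np.roots([1.0, w1**2 + w2**2 + c**2, (w1 * w2)**2])
    return np.sort(np.sqrt(np.maximum(-p.real, 0.0)))

print("general case      :", frequencies(0.5, 0.8, 0.3))
print("c -> 0            :", frequencies(0.5, 0.8, 1e-6))   # approaches |w1|, |w2|
print("c -> infinity     :", frequencies(0.5, 0.8, 1e3))    # approaches 0 and c
print("one weight is zero:", frequencies(0.5, 0.0, 0.3))    # a single non-zero frequency
```

In all four cases the real parts of the eigenvalues vanish, so the states remain sums of pure cosines; only the frequencies move.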

A.C Extending to multiple inter-connected oscillators
Proposition: The states of n coupled CPG oscillators can be described as a sum of n cosines with distinct frequencies. Empirically we find that, no matter what structure of inter-connectivity exists between n coupled oscillators, the canonical form will always contain n purely imaginary eigenvalues (or fewer). The structure of the CPG-network defines which cosines are summed together in which CPG state (i.e., the distribution of cosines). Network structure thus affects the cosine distribution, frequency content, amplitude, and phase-offset in the network's states.
To analyse the effect of optimising network structure we first need to define what we see as network structure. If we take several independent oscillators, we can allow them to interact by inter-connecting their respective x-neurons. Whether such an inter-connection exists dramatically alters the behaviour of the CPG-network, as exemplified in the two coupled oscillator case (section A.B). Defining where inter-connections exist (or even where oscillators are formed) is a structural parameter that is separate from just changing weights (even though a connection weight of 0 can be seen as a part of weight optimisation). For multiple inter-connected CPGs, the mathematics becomes too tedious. Nevertheless, we can reason about our proposition and support it empirically.
From Equation A16 it is important to note that the discriminant pushes the frequencies away from the non-discriminant through the ± sign. The size of the ± perturbation is given by the discriminant. The value of the discriminant tends to approximate the non-discriminant if the relative influence of the inter-connections is bigger than that of the oscillator weights. This effect shows that, when coupling two oscillators with a relatively strong inter-connectivity, the frequency content tends towards 0 and 2× the non-discriminant value. In the extreme case (c → ∞), the discriminant collapses one eigenvalue to 0, resulting in one dominant frequency equal to c, as shown in Equation A16.
In the case of multiple inter-connected oscillators, a similar effect occurs. More inter-connectivity increases the non-discriminant perturbation. We show this behaviour of the eigenvalues empirically by generating 10,000 random CPG-networks containing 10 oscillators with different inter-connectivity densities of [0.0, 0.3, 1.0]. Network weights are uniformly sampled from [-1, 1] for each density, and the CPG-frequency densities are obtained through eigendecomposition (we used numpy.linalg.eig()); see Figure A3.
In the absence of inter-connections between oscillators (d = 0, in blue) we get a uniform distribution of eigenvalues, which is the same as our sampled weight distribution [-1, 1]. This is as expected, since d = 0 means no inter-connections; thus our network acts as a set of independent oscillators with eigenvalues equal to the internal oscillator weights. When we add all inter-connections (d = 1.0, in green) to the CPG-network we find a high density around zero. This is because the ± non-discriminant perturbs the eigenvalues towards 0 and 2× the non-discriminant value, as explained in section A.B. In the fully inter-connected case (d = 1), we can see a pattern emerge with higher densities around certain eigenvalues, which is out of scope for this analysis. With the intermediate inter-connectivity density (d = 0.3) we find both extremes. This is because non-connected CPGs act as single oscillators, while connected CPGs form sub-networks of inter-connected behaviour.
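The sampling experiment described above can be sketched as follows. This is an illustrative reimplementation rather than the authors' script: the state order [x_1, y_1, x_2, y_2, ...], the helper name `random_cpg_matrix`, the use of `numpy.linalg.eigvals` instead of `numpy.linalg.eig`, and the reduced sample of 200 networks per density (instead of 10,000, to keep the run short) are all assumptions.

```python
import numpy as np

def random_cpg_matrix(n_osc, density, rng):
    """Random CPG-network as a skew-symmetric LTI matrix (assumed layout)."""
    A = np.zeros((2 * n_osc, 2 * n_osc))
    for i in range(n_osc):
        A[2 * i, 2 * i + 1] = rng.uniform(-1, 1)        # oscillator weight w_i
        for j in range(i + 1, n_osc):
            if rng.random() < density:                  # keep this x_i <-> x_j connection?
                A[2 * i, 2 * j] = rng.uniform(-1, 1)    # inter-connection weight c_ij
    return A - A.T                                      # enforce the +/- weight coupling

rng = np.random.default_rng(0)
n_networks, n_osc = 200, 10
spectra = {}
for d in (0.0, 0.3, 1.0):
    eigvals = np.concatenate([np.linalg.eigvals(random_cpg_matrix(n_osc, d, rng))
                              for _ in range(n_networks)])
    spectra[d] = eigvals
    share_near_zero = np.mean(np.abs(eigvals.imag) < 0.1)
    print(f"d={d}: max |Im(lambda)| = {np.abs(eigvals.imag).max():.2f}, "
          f"share with |Im(lambda)| < 0.1 = {share_near_zero:.2f}")
```

A histogram of `spectra[d].imag` for each density gives a plot comparable in spirit to Figure A3.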
On the nature of our CPG, we would like to emphasise the following. The dynamical analysis of the SO2-oscillator reveals that our system contains only elliptic fixed points (i.e. purely imaginary eigenvalues). The behaviours of our system are therefore marginally stable, which is also implied by the fact that the total energy in the system is conserved. Energy conservation also reveals our system to be non-dissipative, which excludes the existence of possible attractors, i.e. limit cycles. The oscillations in our system are periodic when the randomly sampled weights produce only rational eigenvalues (resulting in a single full period). Whenever at least one eigenvalue is irrational, the CPG-network cannot have a full period, resulting in quasi-periodic oscillatory behaviour. We conjecture the latter to be more likely, as the set of irrational numbers is uncountably infinite whereas the set of rational numbers is only countably infinite (this requires further analysis). Quasi-periodicity means that there are infinitely many orbits, which are oscillatory but not closed (states are only visited once). Nevertheless, such systems often exhibit short-term semi-periodic oscillatory behaviour with regular subspace regions where the motion is periodic [? ]. As we only use a sub-space for robot control (we only read out a sub-space of the network), our CPG-network is suitable for (periodic) locomotion skills.
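The energy argument can be written out in one line. As a hedged sketch, assume the total energy is taken to be E = ½ sᵀs (the paper's own expression is not reproduced here) and that the LTI matrix A is skew-symmetric (Aᵀ = −A), which is what the ±w and ±c coupling implies:

```latex
E(t) = \tfrac{1}{2}\, s(t)^{T} s(t), \qquad
\dot{E} = s^{T}\dot{s} = s^{T} A s = 0,
\quad \text{since} \quad
s^{T} A s = \left(s^{T} A s\right)^{T} = s^{T} A^{T} s = -\, s^{T} A s .
```

Under these assumptions the state norm is constant along every trajectory, so trajectories neither decay to a fixed point nor converge to a limit cycle.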

Appendix B CPG-network optimisation
Our CPG-network analysis demonstrates that the number of different cosines in the neuron states is equal to the number of oscillators involved in an inter-connected structure (as follows from Equation A23). Furthermore, each state is a linear sum of the individual cosine signals, where both the amplitudes and offsets depend on the weights. This means that inter-connectivity considerably complicates the effect of weights on the CPG behaviour. Rather than only changing the frequency content of the corresponding oscillator, weight adaptation now alters the amplitudes, offsets, and frequencies of that oscillator and of all connected oscillators as well. Such an effect will likely result in many local optima during optimisation.

B.A Optimising weights
CPG optimisation aims to approximate a time-series with the states (or a subset) of the neurons inside the network. In robotics this time-series often produces the optimal motor inputs for a certain robot behaviour. We want to investigate the property of the CPG to have many local minima with respect to weight optimisation. In order to do so, we create a target CPG-network consisting of eight oscillators and a random inter-connection density (d = 0.3), where all the weights are uniformly sampled from [-1, 1]. Initial states of all neurons are set to ½√2. The target network generates a time-series for 60 s (dt = 0.05).
We want to approximate this time-series by optimising a proposal CPG-network. This network has the same structure and initial state as the true network, but with randomly initialised weights (U[−1, 1]). For weight optimisation we used differential evolution, RevDE [? ] (with the following parameters: population size λ = 90, top-sample size µ = 30, scaling factor F = 0.3, crossover rate CR = 0.9, generations N_gen = 100). See the main methods for a detailed description of the algorithm. We calculate the fitness of a network as the negative summed absolute error between the time-series produced by our proposal network and by the true network. We repeat the experiment 30 times (N_rep = 30) using the same target network but a different initial proposal network (different in terms of random weights, not structure). The advantage of this optimisation problem is that we have access to the true CPG-network eigenvalues that generated the time-series. This allows us to check if WO consistently converges to the eigenvalue distribution of the true network. The aggregated distributions over all 30 repetitions are presented in Figure B4.
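The experimental setup above can be sketched in NumPy as follows. The matrix layout, the helper names (`random_cpg_matrix`, `reweight`, `simulate`, `fitness`) and the choice to score all neuron states are assumptions made for illustration; the optimiser itself (RevDE) is not included. The simulation evaluates the exact solution s(t) = e^{At} s_0 on the time grid, using the fact that iA is Hermitian when A is skew-symmetric.

```python
import numpy as np

def random_cpg_matrix(n_osc, density, rng):
    """Random CPG-network as a skew-symmetric LTI matrix (assumed layout)."""
    A = np.zeros((2 * n_osc, 2 * n_osc))
    for i in range(n_osc):
        A[2 * i, 2 * i + 1] = rng.uniform(-1, 1)        # oscillator weight
        for j in range(i + 1, n_osc):
            if rng.random() < density:
                A[2 * i, 2 * j] = rng.uniform(-1, 1)    # x-neuron inter-connection
    return A - A.T

def reweight(A, rng):
    """Proposal network: same structure as A, freshly sampled weights."""
    upper = np.triu(A != 0, k=1)
    B = np.where(upper, rng.uniform(-1, 1, A.shape), 0.0)
    return B - B.T

def simulate(A, s0, duration=60.0, dt=0.05):
    """Exact states s(t) = exp(A t) s0 on a time grid, for skew-symmetric A."""
    lam, U = np.linalg.eigh(1j * A)                  # i*A is Hermitian, so eigh applies
    t = np.arange(int(round(duration / dt)) + 1) * dt
    coeff = U.conj().T @ s0                          # initial state in the eigenbasis
    phases = np.exp(-1j * np.outer(t, lam))          # e^{-i*lam*t}, one row per time step
    return ((phases * coeff) @ U.T).real             # shape (time steps, neurons)

def fitness(proposal, target):
    """Negative summed absolute error between two time-series."""
    return -np.abs(proposal - target).sum()

rng = np.random.default_rng(0)
target_A = random_cpg_matrix(8, 0.3, rng)
s0 = np.full(16, 0.5 * np.sqrt(2))                   # all neurons start at (1/2)*sqrt(2)
target = simulate(target_A, s0)
proposal = simulate(reweight(target_A, rng), s0)
print("fitness of a random proposal:", fitness(proposal, target))
```

A weight optimiser would repeatedly call `fitness` on proposals produced by different weight sets while keeping the structure mask fixed.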
We see that initially the proposal network has a similar distribution to a randomly inter-connected network with density d = 0.3 (shown in Figure A3). At the end of WO, the eigenvalue distribution of the best proposal network (orange) approximates that of the target network (green). Nevertheless, we find a high density of incorrect eigenvalues, despite the fact that performance was similar across all the repetitions. Furthermore, at the end of each run we find different sets of weights with similar performance (shown in orange in Figure B4b). The mean value across 30 runs therefore averages around 0, with a high standard error (black lines). Overall, this means that we found many solutions, with different sets of weights and eigenvalues, that perform similarly well across different runs. This indicates that, for our time-series approximation experiment, the WO search space contains many local minima.

B.B Optimising initial states
The results shown in Figure B4 indicate that WO might provide an unnecessarily difficult search space, insofar as the target time-series can be approximated with different sets of weights even if the proposal network's structure is the same (i.e., many local optima). It is therefore worth investigating whether initial state optimisation (ISO) provides a better search space for optimisation. We repeat the previous experiment, in which we try to approximate the same time-series produced by the target CPG-network. In addition to WO, we now optimise the initial state of a proposal network with fixed random weights, using RevDE with the same parameter values (N_rep = 30). The mean absolute error as a function of generations is shown in Figure B5.
The results show that ISO improves rapidly on our time-series problem and subsequently plateaus at around 0.0586 near generation 10. ISO outperforms WO from the start up until generation 35. WO plateaus shortly after, with a similar final performance of around 0.0573. A Student's t-test on the final mean values indicates no statistically significant difference between state and weight optimisation (p > 0.05). This shows that ISO can deliver similar performance with fewer samples. Consequently, if we can find the proper offset and amplitude of our cosines, then we can sufficiently approximate a time-series without the correct frequency content.
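To make the idea of initial state optimisation concrete, here is a minimal sketch in which the weights of the proposal network are sampled once and never changed, and only the initial state is searched. Note the loud assumption: a simple random hill-climbing search stands in for RevDE, so this code does not reproduce the reported numbers. All helper names, the perturbation size `sigma`, and the trial budget are hypothetical.

```python
import numpy as np

def random_cpg_matrix(n_osc, density, rng):
    """Random CPG-network as a skew-symmetric LTI matrix (assumed layout)."""
    A = np.zeros((2 * n_osc, 2 * n_osc))
    for i in range(n_osc):
        A[2 * i, 2 * i + 1] = rng.uniform(-1, 1)
        for j in range(i + 1, n_osc):
            if rng.random() < density:
                A[2 * i, 2 * j] = rng.uniform(-1, 1)
    return A - A.T

def simulate(A, s0, duration=60.0, dt=0.05):
    """Exact states s(t) = exp(A t) s0 on a time grid, for skew-symmetric A."""
    lam, U = np.linalg.eigh(1j * A)
    t = np.arange(int(round(duration / dt)) + 1) * dt
    phases = np.exp(-1j * np.outer(t, lam))
    return ((phases * (U.conj().T @ s0)) @ U.T).real

def mean_abs_error(a, b):
    return np.abs(a - b).mean()

def iso_random_search(A, target, n_trials, rng, sigma=0.2):
    """Stand-in for RevDE: perturb the best initial state, keep improvements."""
    best_s0 = rng.uniform(-1, 1, A.shape[0])
    best_err = mean_abs_error(simulate(A, best_s0), target)
    history = [best_err]
    for _ in range(n_trials):
        candidate = best_s0 + sigma * rng.normal(size=best_s0.shape)
        err = mean_abs_error(simulate(A, candidate), target)
        if err < best_err:
            best_s0, best_err = candidate, err
        history.append(best_err)
    return best_s0, best_err, history

rng = np.random.default_rng(1)
target_A = random_cpg_matrix(8, 0.3, rng)
target = simulate(target_A, np.full(16, 0.5 * np.sqrt(2)))
proposal_A = random_cpg_matrix(8, 0.3, rng)      # fixed random weights, never optimised
best_s0, best_err, history = iso_random_search(proposal_A, target, 200, rng)
print(f"error at start: {history[0]:.4f}, after 200 trials: {best_err:.4f}")
```

For simplicity the proposal here also draws a new random structure, whereas the experiment above keeps the target's structure; swapping in a structure-preserving resampling of the weights would match the description more closely.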

Appendix C ISO analysis
Bootstrapping and scalability are the two main principles that give ISO an additional advantage over WO. In this section, we analyse both in depth. Furthermore, we test the model-agnosticism of our algorithm on a test suite of 20 robots for 6 different skills in parallel [? ].

C.A Hyper parameters
Bootstrapping is enabled by the additional intermediate-state samples obtained during a trial. This provides us with additional hyper-parameters to tune for ISO: the longer the trial time, the more samples we get, at the cost of increased experiment time. Furthermore, we want to analyse how this performance is related to the length of the target time-series.
The results in Figure C6 show a tendency for smaller target windows (W) to have higher performance (i.e. lower mean absolute error). This is not surprising, as the length of the target time-series increases the difficulty of the search. It is interesting to note that increased trial time does not necessarily lead to increased performance. Increasing trial time leads to additional bootstrapped samples, as we are able to use more intermediate states, which should lead to faster convergence as every single trial contains more sub-states. This dynamic is visible up to a trial time twice as long as the window time (row 1, 1.25, 2) but plateaus around 5. Finally, the number of trials increases performance, which is not surprising, since more samples give a higher chance of finding a 'good' initial state. It should be noted that there is a diminishing return in performance efficiency with respect to total experiment time: 1000 trials do not yield twice the performance of 500 trials.
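The relation between trial time, window length, and the number of bootstrapped samples can be illustrated with a small sketch. It rests on an interpretation that should be read as an assumption: every intermediate state of a trial is treated as a candidate initial state, and the window of length W that follows it is scored against the target. The two-oscillator signal, the weights, and all names are illustrative.

```python
import numpy as np

def trial_states(w, s0, t):
    """Full (x, y) states of uncoupled SO2-oscillators, closed form from section A.A."""
    x0, y0 = s0[0::2], s0[1::2]
    wt = np.outer(t, w)
    x = x0 * np.cos(wt) + y0 * np.sin(wt)
    y = -x0 * np.sin(wt) + y0 * np.cos(wt)
    states = np.empty((len(t), 2 * len(w)))
    states[:, 0::2], states[:, 1::2] = x, y
    return states

def bootstrap_errors(trajectory, target):
    """Mean absolute error of every window of the trial against the target window."""
    n_window = len(target)
    n_candidates = len(trajectory) - n_window + 1
    return np.array([np.abs(trajectory[k:k + n_window] - target).mean()
                     for k in range(n_candidates)])

dt, trial_time, window_time = 0.05, 10.0, 2.0
t = np.arange(int(round(trial_time / dt)) + 1) * dt
n_window = int(round(window_time / dt))

rng = np.random.default_rng(2)
w = np.array([0.7, 1.3])                                   # fixed, illustrative weights
trajectory = trial_states(w, rng.uniform(-1, 1, 4), t)     # one trial
target = trial_states(w, rng.uniform(-1, 1, 4), t)[:n_window]

errors = bootstrap_errors(trajectory, target)
best_k = errors.argmin()
candidate_initial_state = trajectory[best_k]               # bootstrapped initial state
print(f"{len(errors)} candidates from one trial, best error {errors[best_k]:.3f}")
```

With these settings one 10 s trial yields 162 candidate initial states for a 2 s window, which shows why longer trials add samples without adding trials.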

C.B Scalability
Parallelism can greatly improve ISO efficiency when learning multiple skills at once. This is possible because the trials are independent of each other. Our mathematical analysis shows the existence of infinitely many semi-periodic orbits in our CPG-network, due to the elliptic [...] stronger solution for each skill. Thus, all skills have the same probability of being improved in a trial. By the mathematical definition of ISO, learning time should be constant, as we evaluate all the skills at once for a constant number of trials. Nevertheless, we test this hypothesis with the random time-series experiment. We reason as follows: "learning different 'skills' is, in essence, learning different motor input time-series". The benefit of the time-series approximation task is that all time-series have the same performance metric (mean absolute error), which makes a comparison of 'constant learning time' possible. In contrast, comparing how well a robot turns with how well it jumps cannot reveal such a thing. We assume that the constant learning time found during the 100 time-series experiments will hold for different robot skills as well. In the real world, we suspect that motor inputs are correlated between skills, making the parallel learning task even more efficient than presented here.
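The claim that the number of trials does not grow with the number of skills can be illustrated with a short sketch: every trial is scored against all target time-series at once, and the best trial is kept per skill. This is an illustrative setup only; it uses uncoupled oscillators (d = 0) in closed form to keep the code short, random initial states in place of a real optimiser, and hypothetical names and sizes.

```python
import numpy as np

def rollout(w, s0, t):
    """x-neuron signals of uncoupled SO2-oscillators (closed form from section A.A)."""
    x0, y0 = s0[0::2], s0[1::2]
    wt = np.outer(t, w)
    return x0 * np.cos(wt) + y0 * np.sin(wt)            # shape (time steps, oscillators)

rng = np.random.default_rng(3)
n_osc, n_trials, n_skills = 4, 50, 20
t = np.arange(100) * 0.05
w = rng.uniform(-1, 1, n_osc)                            # fixed random weights

# One batch of trials, each starting from a different random initial state.
trials = np.stack([rollout(w, rng.uniform(-1, 1, 2 * n_osc), t)
                   for _ in range(n_trials)])            # (n_trials, steps, n_osc)

# n_skills target time-series; random signals stand in for skill-specific motor inputs.
targets = np.stack([rollout(rng.uniform(-1, 1, n_osc), rng.uniform(-1, 1, 2 * n_osc), t)
                    for _ in range(n_skills)])           # (n_skills, steps, n_osc)

# Score every trial against every skill in one broadcasted operation.
errors = np.abs(trials[:, None] - targets[None]).mean(axis=(2, 3))   # (n_trials, n_skills)
best_trial = errors.argmin(axis=0)                       # best trial index per skill
best_error = errors.min(axis=0)
print("trials run:", n_trials, "| skills scored:", n_skills)
print("best trial per skill:", best_trial)
```

Raising `n_skills` only widens the `errors` matrix; the number of rollouts, which is what costs experiment time on a robot, stays at `n_trials`.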

C.C Model-agnostic multi-skill learning on 20 robots
We test our ISO multi-skill learning algorithm on a test suite of 20 robots (similar to the robots used in the current paper [? ]), with 3 additional skills in parallel (6 skills in total): jump (↑) and sideways 'crab walk' to the left and right while facing the same direction (△←, △→). ↑ is defined as the highest position in the z-direction during the last 60 seconds. △← and △→ are defined as the lateral movement in the positive/negative y-direction minus the squared heading.
The average performance per robot is shown in Figure C8 (N_rep = 30 ± SE). For readability we normalised the results by the maximum mean performance per skill: →: 5.26 cm/s, ⟲: 0.28 rad/s, ⟳: 0.30 rad/s, ↑: 39.5 cm, △←: 1.83, △→: 1.63. ISO is able to learn all 6 skills in parallel, with similar efficacy in the original 3 skills as found in the main text. Furthermore, efficacy does not appear to depend on morphology or on the number of servomotors, indicating the model-agnosticism of ISO when learning on different morphologies.

D.B Reality Gap
Simulation and real-world performance values are presented in Table D1. For ISO, ten out of eighteen experiments dropped in performance in the real world (indicated by red numbers in the real column), with an overall average sim-to-real performance drop of -2.32%. For WO*, seventeen out of eighteen skills dropped in performance in the real world, with an average sim-to-real performance drop of -57.33%.

D.C Adding feedback
Adding feedback to open-loop control requires careful consideration, as dynamics can change due to additional noise and 'closing the loop'. Nevertheless, feedback is fundamental for consistent performance and for completing more complex tasks that require multiple skills. We extend our controller with additional feedback to obtain such high-level control, using the skills learned during real-world ISO. Our approach is inspired by a biological perspective, where open-loop control patterns are used within a control hierarchy. The open-loop skills (like our CPG-controller) are preprogrammed motor primitives that are encoded in the M1 cortex and the CPGs of the spinal cord [? ]. Low-level feedback control occurs locally in the lower centres of the central nervous system/musculoskeletal system and is responsible for consistent movement execution under perturbations and noise, e.g. uneven terrain. Initialisation and switching of skills is governed through abstract functional planning in higher-level brain centres like the frontal cortex. We designed the feedback control mechanism for our robots with a similar approach. Here, open-loop control is obtained through ISO, while proper movement execution, i.e. lower-level feedback control, is done locally ('outsourced' to the servos' internal PID controller). To 'select' and initiate skills, we extend our CPG-network with a higher-level controller that provides signals to the CPG-network. We carefully designed this mechanism so that it does not disturb the marginally stable open-loop dynamics (section A.C) of the network (Figure D10).
For the higher-level skill-switch commands we monitor the open-loop target states (i.e. the x-neuron signals that are sent from the CPG-network) and transition the CPG (within and) between skills when the states are nearly identical. This means that we only perturb the hidden y-neuron states of the CPG-network, as the x-neurons overlap at that time. We deploy this controller for a targeted locomotion task in simulation, on the Spider with the best real-world ISO controller (the skills are shown in Figure D9: Spider ISO). The robot is initialised at the origin and needs to reach a random point in space at a distance of 4 m. In the first minute, we enforce only the → skill to assess the average heading direction. Afterwards, we implement a simple heading correction loop, where the target heading is defined by the vector between the target point (green) and the current robot position. When the current heading deviates by more than ±0.75 rad from the target heading, we induce a skill switch.
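The heading correction loop can be sketched as a small decision rule. The 0.75 rad threshold comes from the text; everything else is an assumption made for illustration, in particular the skill names (`"forward"`, `"rotate_left"`, `"rotate_right"`), the rule that the sign of the heading error picks the rotation direction, and the function names.

```python
import math

def heading_error(robot_xy, robot_heading, target_xy):
    """Signed angle (rad) from the current heading to the direction of the target."""
    target_heading = math.atan2(target_xy[1] - robot_xy[1],
                                target_xy[0] - robot_xy[0])
    error = target_heading - robot_heading
    return math.atan2(math.sin(error), math.cos(error))   # wrap to [-pi, pi]

def select_skill(error, threshold=0.75):
    """Pick a skill from the heading error; names and sign convention are hypothetical."""
    if error > threshold:
        return "rotate_left"       # target lies counter-clockwise of the heading
    if error < -threshold:
        return "rotate_right"      # target lies clockwise of the heading
    return "forward"               # heading is close enough, keep walking

# Robot at the origin facing along +x, target 4 m away straight "up" (+y).
error = heading_error((0.0, 0.0), 0.0, (0.0, 4.0))
print(round(error, 3), select_skill(error))
```

In the closed-loop controller described above, the selected skill would only be activated at a moment when the x-neuron states of the current and the next skill nearly coincide.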
Fig. A3: Density plot of the distribution of eigenvalues for a random CPG with 10 oscillators.Inter-connected density (d) indicates the percentage of inter-connections between x-neurons in the network.Weights and connections are randomly sampled from a uniform distribution between [-1,1].
Fig. B4: WO finds a wide distribution of eigenvalues with varying weights across 30 different runs. a) Distribution of: green, the true eigenvalues of the target CPG-network to be approximated; blue, the initial eigenvalues of the random networks at the start of an evolutionary run; orange, the best CPG-network's eigenvalues after weight optimisation. b) True weights of the target network (in green), with the end-values of the best weights after optimisation (in orange). Error bars indicate the standard error.
Fig. B5: Initial State Optimisation (ISO, blue) outperforms Weight Optimisation (WO, orange) in the time-series approximation experiment. The lines show the mean performance curves over 30 independent runs (N_rep = 30 ± SE).
Fig. C7: ISO scalability tests, learning curves (mean over N_rep = 30; highlights indicate ±SE). We compare the efficacy of learning 'n' time-series in parallel (denoted T_n).

Fig. D10: Flow chart of our feedback control structure.
Fig. D11: (left) Neuron states of a single CPG-oscillator. The blue line is the unperturbed open-loop control for the → skill; the orange line is the feedback control with skill switch. Vertical lines indicate a perturbation, with red representing a skill switch. (right) Robot state during the targeted locomotion task (blue is the start position, green is the target position). The gray line shows the trajectory of the robot, with red vectors indicating its estimated heading during a skill switch.

Table D1: Simulator and corresponding real-world performance. Colors indicate a decrease/increase with respect to simulated performance (red/green respectively). The final column indicates the average performance drop per skill among all robots, with an underline indicating a statistically significant difference between ISO and WO* (p << 0.05 for all skills).